
[RFC] Engine Refactor Proposal #3752

Draft · wants to merge 1 commit into develop

Conversation

@ashwinvaidya17 (Contributor) commented Jul 19, 2024

TL;DR

This PoC proposes, and is meant to start a discussion on, refactoring the current OTX Engine. The refactor would make tasks more flexible, reduce code duplication, and directly expose all features of the upstream frameworks. The code changes here are at a very high level.

Intro

This is a very limited PoC (think of it as pseudo-code) to get the discussion rolling. It is not as extensive in features as what Harim built last year (https://github.com/harimkang/training_extensions/tree/harimkan/feature-v2-ultralytics/src/otx/v2/adapters/torch/ultralytics). The main reason is that the Anomalib Engine does a lot of things under the hood that would require major effort to re-implement in OTX. I'll share the presentation on Teams. It would be nice to get feedback on this to flesh it out further. We can also schedule 1:1s to discuss any concerns and adjust accordingly.

Motivation

  • Currently, the Anomaly task is limited, and is constrained by the requirement of moulding it to match the OTX Model, Engine, and DataLoader. This involves duplicating code, re-implementing Anomalib features, and a complex inheritance hierarchy, which makes maintenance harder and can lead to the same features diverging across repos.
  • The Anomalib Engine also internally takes care of attaching the right pre/post-processing callbacks, modifying the training loop based on the learning type, setting up model-specific transforms, and updating dataloaders based on the task type. Supporting these in OTX without Anomalib's Engine would require a lot of effort and some complex logic.
  • From the Geti and OTX perspective, the only output artefacts we care about are the metrics produced, the input dataset, and the output annotations. This means that, as long as we have the right converters, the tasks internally have more flexibility in the representation they use for models and datasets. And if we pass the trained model weights directly to the upstream framework and produce a standard result, the models don't need to conform to a specific format.

Approach

  • The Engine can remain as-is and will work like the AutoEngine proposed by Harim in his initial design; see engine_poc.py. (A rough sketch of this split follows the list.)
  • The contents of the current Engine can be extracted into a separate adapter that trains the current OTX model.
  • Another adapter can be introduced for Anomalib, along with an AnomalibEngine, to use Anomalib's own engine directly for training.
  • Add a Datumaro adapter in Anomalib so it can ingest datasets from OTX and Geti.
  • Anomalib, OTX, and ModelAPI will require updating so that the OpenVINO models produced by both frameworks are identical and can use the same inferencers.
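
A rough sketch of the Engine/adapter split, using the names that appear in the engine_poc.py hunks quoted in the review below (Adapter, AnomalibEngine, METRICS). OTXAdapter and the method bodies are illustrative placeholders, not the actual PoC implementation:

    from abc import ABC, abstractmethod
    from typing import Any

    METRICS = dict[str, Any]  # type alias; assumed to mirror otx.engine.base.METRICS

    class Adapter(ABC):
        """Backend-specific trainer wrapper that the single Engine dispatches to."""

        @abstractmethod
        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            ...

    class OTXAdapter(Adapter):
        """Hypothetical adapter: the current Engine's training loop moves here."""

        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            raise NotImplementedError  # placeholder for the extracted OTX loop

    class AnomalibAdapter(Adapter):
        """Delegates training to Anomalib's own Engine, keeping its callbacks."""

        def __init__(self) -> None:
            from anomalib.engine import Engine as AnomalibEngine
            self._engine = AnomalibEngine()

        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            self._engine.fit(model=model, datamodule=datamodule)
            return dict(self._engine.trainer.callback_metrics)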

Outcome

  • This will make it easier to expose all of Anomalib's models and features, such as zero-shot, few-shot, and LLM-based models, as well as synthetic data generation.
  • Since the underlying models remain the same for Anomalib and OTX, models trained in one framework will be interchangeable with the other.
  • Further, we can unify the OpenVINO models exported from Anomalib and OTX so that they are interchangeable.

API Example

A single Engine serves as the entry point for all backends.

This lets us directly use not only all the models of the other framework but also its dataloaders.

from otx.engine.engine_poc import Engine
from anomalib.data import MVTec
from anomalib.models import Padim

engine = Engine()
model = Padim()
datamodule = MVTec(root="./datasets/MVTec")
engine.train(model, datamodule)

An Ultralytics example, just to show that we can extend this to other frameworks.

from torch.utils.data import DataLoader
from ultralytics import YOLO
from ultralytics.data import ClassificationDataset  # import paths may vary across ultralytics versions
from ultralytics.utils import DEFAULT_CFG

model = YOLO("yolov8n.pt")
engine = Engine()
dataset = ClassificationDataset(root="./datasets/hazelnut_toy", args=DEFAULT_CFG)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
engine.train(model=model, datamodule=dataloader)

Current OTX models remain compatible and use the same entry points.

from otx.algo.classification.mobilenet_v3 import MobileNetV3ForMulticlassCls
from otx.core.config.data import SubsetConfig
from otx.core.data import OTXDataModule
from otx.core.types.task import OTXTaskType

# simple_transforms: a list of torchvision transforms, defined elsewhere in the PoC

datamodule = OTXDataModule(
	task=OTXTaskType.MULTI_CLASS_CLS,
	data_format="cifar",
	data_root="./datasets/cifar/cifar-10-batches-py",
	train_subset=SubsetConfig(
		batch_size=64,
		subset_name="data_batch_1",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
	val_subset=SubsetConfig(
		batch_size=64,
		subset_name="test_batch",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
	test_subset=SubsetConfig(
		batch_size=64,
		subset_name="test_batch",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
)
model = MobileNetV3ForMulticlassCls(
	label_info=datamodule.label_info,
)
engine = Engine()
engine.train(model, datamodule, max_epochs=1, deterministic=False, val_check_interval=0.5)

CLI Registration

If the Engine class is annotated correctly to accept models from different frameworks, we can directly expose these models to the CLI:

class Engine:
    ...
    def train(
        self,
        model: OTXModel | AnomalyModule | UltralyticsModel,
        ...
    ) -> METRICS: ...


class CLI:
    def add_subcommands(self) -> None:
        ...
        subparser.add_method_arguments(Engine, "train")
        ...
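
For illustration, a minimal self-contained sketch of this mechanism. It only assumes jsonargparse's public add_method_arguments API; the model classes are stand-ins, not the actual cli_poc.py:

    from jsonargparse import ArgumentParser

    class BaseModel: ...
    class ModelA(BaseModel): ...  # stand-in for e.g. an Anomalib model
    class ModelB(BaseModel): ...  # stand-in for e.g. an Ultralytics model

    class Engine:
        def train(self, model: BaseModel, max_epochs: int = 10) -> dict:
            return {}

    parser = ArgumentParser()
    # jsonargparse reads train()'s signature: the class-typed `model` parameter
    # becomes a subclass-selectable argument, so every BaseModel subclass is
    # accepted via a class path and listed in --help.
    parser.add_method_arguments(Engine, "train")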

Models from the upstream repos are automatically exposed and can work with OTX:

python cli_poc.py train --help
...
Train the model:
  --model.help CLASS_PATH_OR_NAME
                        Show the help for the given subclass of {OTXModel,AnomalyModule,Model} and exit.
  --model MODEL         (required, type: OTXModel | AnomalyModule | Model, known subclasses: otx.core.model.base.OTXModel, otx.core.model.base.OVModel, anomalib.models.Cfa, anomalib.models.Cflow, anomalib.models.Csflow, anomalib.models.Dfkde,
                        anomalib.models.Dfm, anomalib.models.Draem, anomalib.models.Dsr, anomalib.models.EfficientAd, anomalib.models.Fastflow, anomalib.models.Fre, anomalib.models.Ganomaly, anomalib.models.Padim, anomalib.models.Patchcore,
                        anomalib.models.ReverseDistillation, anomalib.models.Rkde, anomalib.models.Stfpm, anomalib.models.Uflow, anomalib.models.WinClip, anomalib.models.AiVad, ultralytics.engine.model.Model, ultralytics.YOLO,
                        ultralytics.YOLOWorld, ultralytics.FastSAM, ultralytics.NAS, ultralytics.RTDETR, ultralytics.SAM)
...

Signed-off-by: Ashwin Vaidya <[email protected]>
@harimkang (Contributor) left a comment:

Thank you for bringing back my initial design and reviewing it. I believe there are pros and cons to both the current and past designs; while this PR's design maximizes scalability, it looks quite different from the Engine's current policy. There still seems to be a lot to discuss.

  • AutoEngine is not the right name in my opinion. I think it would be better to keep Engine but define each backend's sub-engine as **Engine (e.g. AnomalibEngine).

Contributor:

In my opinion, the main focus of this PR is refactoring the Engine-related interfaces for scalability. Our CLI already has the functionality of this code, so if this PoC is going to be merged into develop, we need to look at compatibility with the existing CLI code.

I think the following points should be discussed for the existing CLI code along with these changes.

  1. We need to make sure that our configuration allows us to receive DataModules other than OTXDataModule. -> I think this is possible with some tweaks to add_argument (a sketch follows).
  2. We need to think about how to build Datasets from external frameworks, other than through a DataModule, in the CLI.
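
A hedged sketch of the kind of add_argument tweak meant here, assuming jsonargparse's add_subclass_arguments API (the nested key "data" is illustrative):

    from jsonargparse import ArgumentParser
    from lightning.pytorch import LightningDataModule

    parser = ArgumentParser()
    # Accept any LightningDataModule subclass (OTXDataModule, AnomalibDataModule,
    # an Ultralytics wrapper, ...) rather than hard-coding OTXDataModule.
    parser.add_subclass_arguments(LightningDataModule, "data")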

Anyway, for the CLI part, I think it's better to wait until the Engine's design is finalized and then see if there are any issues.

@ashwinvaidya17 (author):

Yeah, the main intention of this PR is to propose the change to the Engine class. I included the CLI for completeness, and also to ensure that the new design does not break it.

from otx.engine.base import METRICS, Adapter


def wrap_to_anomalib_datamodule(datamodule: OTXDataModule) -> AnomalibDataModule:
Contributor:

Do we really need this?
Can't we just use anomalib.data.Folder?
(like https://github.com/openvinotoolkit/anomalib/blob/main/configs/data/folder.yaml)
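
For reference, a minimal sketch of what the Folder-based alternative could look like; the directory layout is illustrative, and the argument names follow anomalib v1's Folder datamodule:

    from anomalib.data import Folder

    datamodule = Folder(
        name="hazelnut_toy",             # dataset name (illustrative)
        root="./datasets/hazelnut_toy",  # illustrative path
        normal_dir="good",               # sub-folder of normal images
        abnormal_dir="crack",            # sub-folder of anomalous images
    )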


class AnomalibAdapter(Adapter):
    def __init__(self):
        self._engine = AnomalibEngine()
Contributor:

It feels like the Engine holds the Adapter, and the Adapter in turn holds an Engine. Even if they have the same name, wouldn't it be better to use a different name to distinguish them (e.g. trainer)?

@ashwinvaidya17 (author):

Yeah, the naming is super tricky here. The problem is that, internally, the Anomalib Engine also has a trainer. So currently it is Engine -> Adapter -> Engine (Anomalib) -> Trainer, which isn't the best. I am open to alternatives.

Comment on lines +50 to +55
def train(
    self,
    model: OTXModel | AnomalyModule | Model,
    datamodule: OTXDataModule | AnomalibDataModule | UltralyticsDataset,
    **kwargs,
) -> METRICS:
Contributor:

Currently, the arguments for each function in the Engine are very well thought out (per command), so I think we should keep the existing arguments as much as possible.
The guiding principle of the OTX Engine is 1 Model = 1 Engine, and I don't currently see any reason to break this principle.
Therefore, I think it is correct for the model to be received in __init__. (A sketch of this style follows.)
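
For illustration, a minimal sketch of the 1 Model = 1 Engine style being described; the names are placeholders, not the actual Engine signature:

    class Engine:
        def __init__(self, model, backend: str = "base", **kwargs) -> None:
            # The model is fixed at construction time; every command
            # (train/test/export/...) then operates on this one model.
            self.model = model

        def train(self, max_epochs: int = 10, **kwargs) -> dict:
            ...  # no `model` parameter here: 1 Model = 1 Engine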

Contributor:

def train(
        self,
        max_epochs: int = 10,
        seed: int | None = None,
        deterministic: bool | Literal["warn"] = False,
        precision: _PRECISION_INPUT | None = "32",
        callbacks: list[Callback] | Callback | None = None,
        logger: Logger | Iterable[Logger] | bool | None = None,
        resume: bool = False,
        metric: MetricCallable | None = None,
        run_hpo: bool = False,
        hpo_config: HpoConfig = HpoConfig(),  # noqa: B008 https://github.com/omni-us/jsonargparse/issues/423
        checkpoint: PathLike | None = None,
        adaptive_bs: Literal["None", "Safe", "Full"] = "None",
        **kwargs,
    ) -> dict[str, Any]:

I like the current way of doing things better, where we clearly define what we need for each command as an argument to each function, as shown above. Also, I don't see any reason to change the model through the function.

Current OTX: the Engine is created (the model is determined) -> the E2E flow uses this model.
This PR's design: there's room for the model to change with every command. -> Do we need this use case?
I believe the above was discussed and decided in the first Engine design proposal, and I don't see any reason to change it at this time.

@ashwinvaidya17 (author):

Yeah, I prefer explicit arguments as well. I didn't include all of them here because I wanted to focus on the adapter. I also think we should have explicit arguments so that they get serialised by the CLI when creating the YAML file. Maybe this will get polished after a few iterations of discussion.

Contributor:

Yes, I see. The reason I commented was that I thought it would be a good idea to avoid breaking existing Engine usage as much as possible.

def adapter(self, adapter: Adapter) -> None:
    self._adapter = adapter

def get_adapter(self, model: OTXModel | AnomalyModule | Model) -> Adapter:
@harimkang commented Jul 22, 2024:

It's weird to tie adapters to models. I think it would be better to be able to distinguish between adapters and models and have the adapter received in Engine's __init__, but what do you think?
-> I believe that Adapters and Models should be somewhat independent.

def __init__(
    self,
    ...
    backends: Literal["base", "anomalib", "..."] = "base",
    ...
) -> None:

logger = logging.getLogger(__name__)


class Engine:
Contributor:

I believe the current structure is an implementation that doesn't take auto-configuration into account at all. Is auto-configuration considered in the design? (A sketch of how it could fit follows.)
In my opinion, auto-configuration is one of OTX's key features.
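
For context, a hedged sketch of how auto-configuration could sit alongside the proposed design; default_model_for is a hypothetical helper, not an existing OTX API:

    def default_model_for(task):
        """Hypothetical helper: return OTX's recommended model for a task."""
        raise NotImplementedError

    class Engine:
        def __init__(self, model=None, data_root=None, task=None, **kwargs):
            if model is None:
                # Auto-configuration path: infer the task from data_root if
                # needed, then fall back to OTX's per-task default model.
                model = default_model_for(task)
            self.model = model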

)


class UltralyticsTrainer(BaseTrainer):
Contributor:

Why is this class separate from Adapter?

@harimkang (Contributor) left a comment:

To avoid any mistakes, I'm going to Request Changes on this PR first.

@ashwinvaidya17 (author):

@harimkang, an alternative to Adapter could be some sort of factory class, since we pass the model to the __init__ method. Here is a different proposal:

from abc import ABC, abstractmethod
from typing import Any

from anomalib.models import AnomalyModule
from lightning.pytorch import LightningModule


class BaseEngine(ABC):
    @abstractmethod
    def train(self, **kwargs):
        pass

    @classmethod
    @abstractmethod
    def is_valid_model(cls, model: Any) -> bool:
        pass


class EngineRegistry:
    """Registry of available engines, populated via the register decorator."""

    engines = set()

    @classmethod
    def register(cls):
        def decorator(engine_class: type[BaseEngine]):
            cls.engines.add(engine_class)
            return engine_class

        return decorator


@EngineRegistry.register()
class AnomalibEngine(BaseEngine):
    def __init__(self, model: AnomalyModule, **kwargs):
        print("Anomalib Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, AnomalyModule)


@EngineRegistry.register()
class LightningEngine(BaseEngine):
    def __init__(self, model: LightningModule, **kwargs):
        print("Lightning Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, LightningModule)

Then we can initialise the right engine, something like this. Here, Engine serves as the AutoEngine but acts like a factory class:

class Engine:
    def __new__(
        cls, model: Any, **kwargs
    ) -> BaseEngine:  # pass all init parameters to this
        for engine in EngineRegistry.engines:
            if engine.is_valid_model(model):
                return engine(model, **kwargs)
        # Fail loudly if no registered engine accepts the model, instead of
        # implicitly returning None.
        msg = f"No registered engine supports model type {type(model)}"
        raise ValueError(msg)

And to test

class MockAnomalyModule(AnomalyModule):
    def __init__(self):
        pass

    def learning_type(self):
        pass

    def trainer_arguments(self):
        pass


class MockLightningModule(LightningModule):
    def __init__(self):
        pass


if __name__ == "__main__":
    model = MockAnomalyModule()
    engine = Engine(model)
    assert isinstance(engine, AnomalibEngine)

    model = MockLightningModule()
    engine = Engine(model)
    assert isinstance(engine, LightningEngine)

@wonjuleee (Contributor) left a comment:

Could you clarify what components are included in the Anomalib trainer, so that we can customize the OTX engine accordingly?

@ashwinvaidya17 (author):

@wonjuleee good question. The separation of the torch model in Anomalib was something we designed initially to make model export easier. However, the export mechanism in Anomalib has changed over time, and we are looking into embedding the post-processing in the exported model. Additionally, the Anomalib lightning model stores certain metadata that is necessary during different stages of the pipeline.

https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/models/components/base/anomaly_module.py#L49-57

These are themselves configured by callbacks in the engine. The callbacks are hidden here so that users do not modify them inadvertently.
https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/engine/engine.py#L408

Current OTX Anomaly models inherit from Anomalib's models (like Padim), but they also need these callbacks to work. These are present in the task, but they are quite limited and cannot be configured.

The lightning model also stores model-specific transforms that are used to configure the datamodule if required.
https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/engine/engine.py#L398

Hope this clarifies your question.

@sovrasov removed the OTX 2.0 label on Oct 11, 2024